Software Clustering based on Information Loss Minimization
Authors
Abstract
Most algorithms in the software clustering literature use structural information to decompose large software systems. Other approaches, such as those based on file names or ownership information, have also shown merit. However, there is no intuitive way to combine the information obtained from these two types of techniques. In this paper, we present an approach that integrates structural and non-structural information. LIMBO is a scalable hierarchical clustering algorithm that clusters a software system by minimizing information loss. We apply LIMBO to two large software systems in a number of experiments. The results indicate that this approach produces valid and useful clusterings of large software systems. LIMBO can also be used to evaluate how useful various types of non-structural information are to the software clustering process.
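To make "minimization of information loss" concrete, the sketch below implements the merge rule of the agglomerative Information Bottleneck, which information-loss clustering of this kind builds on. Each artifact is normalized into a distribution p(A|c) over its features, and merging clusters c_i and c_j costs delta_I = (p(c_i) + p(c_j)) * JS(p(A|c_i), p(A|c_j)), where JS is the Jensen-Shannon divergence weighted by p(c_i)/(p(c_i)+p(c_j)) and p(c_j)/(p(c_i)+p(c_j)). This is a minimal, quadratic-time illustration, not LIMBO's scalable implementation, which summarizes clusters rather than comparing all pairs. The functions `aib`, `info_loss`, and `merge`, the uniform prior over artifacts, and feature names such as "dev:alice" or "dir:net" (mixing structural and non-structural facts in one vector) are all hypothetical assumptions.

```python
import math
from typing import Dict, List, Tuple

# A cluster summary: (p(c), p(A | c)). All names here are hypothetical.
Cluster = Tuple[float, Dict[str, float]]


def normalize(features: Dict[str, float]) -> Dict[str, float]:
    # Turn raw feature counts into a conditional distribution p(A | object).
    total = sum(features.values())
    return {a: v / total for a, v in features.items()}


def kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    # KL divergence in bits; q must be nonzero wherever p is nonzero.
    return sum(pa * math.log2(pa / q[a]) for a, pa in p.items() if pa > 0)


def merge(ci: Cluster, cj: Cluster) -> Cluster:
    # The merged cluster's distribution is the weight-averaged mixture.
    wi, pi = ci
    wj, pj = cj
    w = wi + wj
    keys = set(pi) | set(pj)
    mixed = {a: (wi * pi.get(a, 0.0) + wj * pj.get(a, 0.0)) / w for a in keys}
    return w, mixed


def info_loss(ci: Cluster, cj: Cluster) -> float:
    # delta_I = (p(ci) + p(cj)) * weighted Jensen-Shannon divergence.
    wi, pi = ci
    wj, pj = cj
    w, pbar = merge(ci, cj)
    js = (wi / w) * kl(pi, pbar) + (wj / w) * kl(pj, pbar)
    return w * js


def aib(objects: Dict[str, Dict[str, float]], k: int) -> List[List[str]]:
    # Greedily merge the pair with the smallest information loss until k remain.
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(objects)
    clusters = [([name], (1.0 / n, normalize(f))) for name, f in objects.items()]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                loss = info_loss(clusters[i][1], clusters[j][1])
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  merge(clusters[i][1], clusters[j][1]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)


if __name__ == "__main__":
    files = {
        "tcp.c": {"calls:socket.c": 1, "dev:alice": 1, "dir:net": 1},
        "udp.c": {"calls:socket.c": 1, "dev:alice": 1, "dir:net": 1},
        "ext4.c": {"calls:inode.c": 1, "dev:bob": 1, "dir:fs": 1},
        "fat.c": {"calls:inode.c": 1, "dev:bob": 1, "dir:fs": 1},
    }
    print(aib(files, 2))
```

On this toy input, files with identical feature distributions merge at zero information loss, while merging files with disjoint features costs a full bit of Jensen-Shannon divergence scaled by cluster weight, so the result groups the networking files apart from the file-system files.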
Similar sources
Clustering Categorical Data based on Information Loss Minimization
As the size of databases continues to grow, understanding their structure gets more difficult. This, together with the lack of documentation and the unavailability of the original designers of the database adds further difficulty to the job of researchers and professionals to understand the structure of large and complex databases. At the same time, data sources are distributed over several sit...
Optimal Capacitor Allocation in Radial Distribution Networks for Annual Costs Minimization Using Hybrid PSO and Sequential Power Loss Index Based Method
In the most recent heuristic methods, the high potential buses for capacitor placement are initially identified and ranked using loss sensitivity factors (LSFs) or power loss index (PLI). These factors or indices help to reduce the search space of the optimization procedure, but they may not always indicate the appropriate placement of capacitors. This paper proposes an efficient approach for t...
Regularized Co-Clustering on Manifold
Co-clustering partitions the rows and columns of a matrix simultaneously. It has been an important research field in data mining and machine learning. It is preferred over traditional homogeneous clustering techniques in many real applications. In this paper, we present a co-clustering algorithm based on local information and regularization. The algorithm seeks to preserve the local intrinsic ...
Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering
One of the most important aspects of software project management is estimating the cost and time required to run an information system. Software managers therefore try to base their estimates on the behavior, properties, and restrictions of a project. Software cost estimation refers to the process of predicting the development requirements of a software system. Various kinds of effort estimation patter...
Bilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...
Publication date: 2003